Facilitating Trust on Data through Provenance
Abstract
Research on trusted computing focuses mainly on the security and integrity of the execution environment, from hardware components to software services. However, this is only one facet of computation; the other is the data. If our goal is to produce trusted results, a trustworthy execution environment is not enough: we also need trustworthy data. The provenance of data plays a pivotal role in ascertaining its trustworthiness. In our work, we explore how to use state-of-the-art systems techniques to capture and reconstruct provenance, enabling us to build trust in both newly generated and existing data.

1.1 Motivation

Provenance is a record that describes the sources and agents involved in producing a piece of data [6]. This record can be analyzed, e.g., to check whether data conforms to designated standards, or to calculate a level of trust in the data in order to assist decision making. Knowing the provenance of data can therefore play a central role in the trust we place in it. Conversely, having no provenance information about our data can undermine the benefits of a trustworthy execution environment: if we cannot trust the data we process, we cannot trust the results we produce either.

1.2 Capturing Provenance Through Dynamic Instrumentation

We have developed a new system called DataTracker¹ [7], which uses Dynamic Taint Analysis (DTA) to capture high-fidelity provenance from unmodified programs. DataTracker is based on the Intel Pin² dynamic binary instrumentation framework and on a modified version of the libdft [4] library, which provides a reusable framework for dynamic taint analysis. The architecture of DataTracker is depicted in Fig. 1a. Its main components are a Pin tool and a converter written in Python: the former generates provenance information in a raw format, which the latter converts to the W3C PROV format [6]. Once the provenance is in PROV, existing tools can be used to further process and visualize it. Fig. 1b shows the provenance graph produced for a simple grep-like utility. DataTracker attributes the output to only two of the four input files, which is …

[Figure 1: (a) DataTracker architecture; (b) provenance graph for a grep-like utility. Recoverable labels: Application Image, Application Process, Linux Kernel, Pin VM, Pin Instrumentation API, Code Cache, Syscall/Event Dispatcher, dtracker, libdft.a, C++ STL, Raw Provenance Information, Raw to PROV Converter, PROV file (Turtle RDF), file A, file B.]

¹ Source code available at: http://github.com/m000/dtracker
² https://software.intel.com/articles/pintool
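To make the idea behind DTA-based provenance concrete, here is a toy simulation in Python. It is a hypothetical sketch, not DataTracker's implementation: DataTracker instruments unmodified binaries with Pin and libdft, whereas this code only models the concept under our own assumptions. Each byte read from an input file is tagged with that file's name, copying a byte propagates its tags, control dependencies are ignored (a common simplification in data-flow taint tracking), and the output's provenance is the union of the tags on its bytes. The names `TByte`, `grep`, `derived_from`, `to_prov_turtle`, the `ex:` namespace and the sample inputs are all illustrative; only the `prov:` terms come from the W3C PROV-O vocabulary, and the emitted Turtle is not DataTracker's actual raw or converted format.

```python
"""Toy simulation of byte-level dynamic taint analysis (DTA) for provenance.

Hypothetical sketch: it models the idea in pure Python and does not use Pin,
libdft, or DataTracker's actual raw or PROV output format.
"""
from dataclasses import dataclass

NEWLINE = ord("\n")


@dataclass(frozen=True)
class TByte:
    """One byte of data plus its taint tags (names of the files it came from)."""
    value: int
    tags: frozenset


def taint_source(name: str, data: bytes) -> list:
    # Taint source: every byte read from an input file is tagged with that file's name.
    return [TByte(b, frozenset({name})) for b in data]


def lines_of(buf: list) -> list:
    # Split a tainted buffer into lines, keeping each line's trailing newline byte.
    lines, cur = [], []
    for tb in buf:
        cur.append(tb)
        if tb.value == NEWLINE:
            lines.append(cur)
            cur = []
    if cur:
        lines.append(cur)
    return lines


def grep(pattern: bytes, inputs: dict) -> list:
    """Grep-like filter that copies every line containing `pattern` to the output.

    Copying a byte propagates its tags unchanged. Which lines get copied is a
    control dependency, which this data-flow-only model does not track.
    """
    out = []
    for name, data in inputs.items():
        for line in lines_of(taint_source(name, data)):
            if pattern in bytes(tb.value for tb in line):
                out.extend(line)
    return out


def derived_from(out_bytes: list) -> list:
    # Union of the tags on all output bytes: the inputs the output was derived from.
    return sorted(set().union(*(tb.tags for tb in out_bytes)))


def to_prov_turtle(output: str, activity: str, inputs: dict, out_bytes: list) -> str:
    # Emit a small PROV-O fragment in Turtle; the ex: namespace is a placeholder.
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix ex: <http://example.org/> .",
        f"ex:{activity} a prov:Activity .",
        f"ex:{output} a prov:Entity ; prov:wasGeneratedBy ex:{activity} .",
    ]
    # The process read (used) every input file...
    lines += [f"ex:{activity} prov:used ex:{name} ." for name in inputs]
    # ...but the output carries taint from only some of them.
    lines += [f"ex:{output} prov:wasDerivedFrom ex:{name} ." for name in derived_from(out_bytes)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "fileA": b"error: disk full\nok\n",
        "fileB": b"ok\nerror: timeout\n",
        "fileC": b"ok\nok\n",
        "fileD": b"all good\n",
    }
    result = grep(b"error", sample)
    print(bytes(tb.value for tb in result).decode(), end="")
    print(to_prov_turtle("out", "grep", sample, result), end="")
```

With the sample inputs, the activity `ex:grep` is recorded as having used all four files, while `ex:out` is derived only from `fileA` and `fileB`, the two files that contributed bytes to the output. In this toy setting, that is the difference between coarse-grained, I/O-level provenance and the finer attribution that byte-level taint tracking makes possible.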
Similar resources
A Provenance-Based Trust Model for Delay Tolerant Networks
Managing trust efficiently and effectively is critical to facilitating cooperation or collaboration and decision making tasks in tactical networks while meeting system goals such as reliability, availability, or scalability. Delay tolerant networks are often encountered in military network environments where end-to-end connectivity is not guaranteed due to frequent disconnection or delay. This ...
Trust Evaluation through User Reputation and Provenance Analysis
Trust is a broad concept which, in many systems, is reduced to reputation estimation. However, reputation is just one way of determining trust. The estimation of trust can be tackled from other perspectives as well, including by looking at provenance. In this work, we look at the combination of reputation and provenance to determine trust values. Concretely, the first contribution of this paper...
A Software Framework for Data Provenance
Data provenance refers to the historical record of the derivation of the data, allowing the reproduction of experiments, interpretation of results and identification of problems through the analysis of the processes that originated the data. Data provenance contributes to the evaluation of experiments. This paper presents a framework for data provenance using the W3C provenance data model, call...
PROTRU: Leveraging Provenance to Enhance Network Trust Based on Distributed Local Intelligence
Provenance can play a significant role in an information system for supporting the calculation of information trust. A node’s trust can change over time after its initial deployment due to various reasons such as energy loss, environmental conditions or exhausting sources. We introduce a node-level trust-enhancing mechanism for information networks using provenance. A unique characteristic of t...
To Trust or Not to Trust? Developing Trusted Digital Spaces through Timely Reliable and Personalized Provenance
Organizations are increasingly dependent on data stored and processed by distributed, heterogeneous services to make critical, high-value decisions. However, these service-oriented computing environments are dynamic in nature and are becoming ever more complex systems of systems. In such evolving and dynamic eco-system infrastructures, knowing how data was derived is of significant importance i...
Publication date: 2014